Solving Batched Linear Programs on GPU and Multicore CPU

Authors

Amit Gurung

Rajarshi Ray

Abstract

Linear programs (LPs) appear in a large number of applications, and offloading them to a GPU is a viable way to gain performance. Existing work on offloading and solving an LP on a GPU suggests that performance is gained for large LPs (typically 500 constraints and 500 variables or more). To gain performance from a GPU for applications involving small to medium sized LPs, we propose batched solving of a large number of LPs in parallel. In this paper, we present the design and CUDA implementation of our batched LP solver library, with coalesced memory access, reduced CPU-GPU memory transfer latency, and load balancing as the goals. The performance of the batched LP solver is compared against sequential solving on the CPU using the open source solver GLPK (GNU Linear Programming Kit). The performance is evaluated for three types of LPs: those whose initial basic solution is feasible, those whose initial basic solution is infeasible, and those whose feasible region is a hyperbox. For the first type, we show a maximum speedup of 18.3× when running a batch of 50k LPs of size 100 (100 variables, 100 constraints). For the second type, a maximum speedup of 12× is obtained with a batch of 10k LPs of size 200. For the third type, we show a significant speedup of 63× in solving a batch of nearly 4 million LPs of size 5 and 34× in solving 6 million LPs of size 28. In addition, we show that the open source LP library GLPK can be easily extended to solve many LPs in parallel with multi-threading. The thread-parallel GLPK implementation runs 9.6× faster in solving a batch of 1e5 LPs of size 100 on a 12-core Intel Xeon processor.
We demonstrate the application of our batched LP solver in the domain of state-space exploration of mathematical models of control systems design.

Preprint dated September 27, 2016; arXiv:1609.08114v1 [cs.DC], 26 Sep 2016. Corresponding author: Amit Gurung.
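The hyperbox case mentioned in the abstract can be illustrated with a small sketch. When the feasible region is a box, lower ≤ x ≤ upper, the objective separates by coordinate, so maximizing c·x reduces to picking the upper bound wherever the coefficient is positive and the lower bound otherwise. The Python below is an illustration under that assumption only; it is not the authors' CUDA library and does not use GLPK. The names `solve_hyperbox_lp` and `solve_batch` are hypothetical, and the thread pool only mirrors the structure of a batch of independent LPs (CPython threads will not reproduce the reported speedups).

```python
from concurrent.futures import ThreadPoolExecutor


def solve_hyperbox_lp(c, lower, upper):
    """Maximize c.x subject to lower <= x <= upper (componentwise).

    Hypothetical helper: each coordinate is chosen independently,
    upper bound if its coefficient is positive, lower bound otherwise.
    """
    x = [u if ci > 0 else l for ci, l, u in zip(c, lower, upper)]
    value = sum(ci * xi for ci, xi in zip(c, x))
    return value, x


def solve_batch(batch, workers=4):
    """Solve a batch of independent hyperbox LPs, keeping input order.

    Each LP is a (c, lower, upper) tuple. The LPs share no state, which
    is the property that makes batching across threads possible.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda lp: solve_hyperbox_lp(*lp), batch))


# Two small LPs of size 2, solved as one batch.
batch = [
    ([1.0, -2.0], [0.0, 0.0], [3.0, 4.0]),
    ([-1.0, -1.0], [-5.0, 2.0], [5.0, 6.0]),
]
results = solve_batch(batch)
for value, x in results:
    print(value, x)
```

Because every LP in the batch is independent, the same per-LP routine could in principle be assigned to one GPU thread or thread block per LP; how the paper's solver actually maps LPs to threads is not stated in this abstract.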

Similar resources

Simultaneous Solving of Batched Linear Programs on a GPU

Present address: Department of Computer Science & Engineering, National Institute of Technology Meghalaya, Shillong 793003, India. Abstract: Linear Programs (LPs) appear in a large number of applications and offloading them to a GPU is viable to gain performance. Existing work on offloading and solving an LP on a GPU suggests that there is performance gain generally on large sized LPs (typically 5...

A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations

As modern hardware keeps evolving, an increasingly effective approach to develop energy efficient and high-performance solvers is to design them to work on many small size and independent problems. Many applications already need this functionality, especially for GPUs, which are currently known to be about four to five times more energy efficient than multicore CPUs. We describe the development...

Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs

Solving a large number of relatively small linear systems has recently drawn more attention in the HPC community, due to the importance of such computational workloads in many scientific applications, including sparse multifrontal solvers. Modern hardware accelerators and their architecture require a set of optimization techniques that are very different from the ones used in solving one relati...

Batched matrix computations on hardware accelerators based on GPUs

Scientific applications require solvers that work on many small size problems that are independent from each other. At the same time, the high-end hardware evolves rapidly and becomes ever more throughput-oriented and thus there is an increasing need for an effective approach to develop energy-efficient, high-performance codes for these small matrix problems that we call batched factorizations....

Simplex Parallelization in a Fully Hybrid Hardware Platform

The simplex method has been successfully used in solving linear programming (LP) problems for many years. Parallel approaches have also extensively been studied due to the intensive computations required, especially for the solution of large LP problems. Furthermore, the rapid proliferation of multicore CPU architectures as well as the computational power provided by the massive parallelism of ...

Journal:

CoRR

Volume abs/1609.08114, issue not listed

Pages: -

Publication date: 2016
